@rchateauneu
Contributor

This is the same code change as in #1317, but on a brand new branch. See also #1300.

The performance test was made by parsing:

g.parse("orkg.nt", format="nt")

The general idea is to use a single regular expression to parse each line into its three or four individual components.
The N-Quads parser now reuses much more of the N-Triples code than before.
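
To illustrate the single-regex idea, here is a minimal sketch. It is not the expression used in this PR: the term patterns are deliberately simplified (they do not cover the full N-Triples grammar), and the names `LINE_RE` and `split_line` are hypothetical.

```python
import re

# Simplified term patterns (assumptions, not the full N-Triples/N-Quads grammar).
IRI = r'<[^<>"{}|^`\\\x00-\x20]*>'
BNODE = r'_:[A-Za-z0-9_-]+'
LITERAL = r'"(?:[^"\\\n\r]|\\.)*"(?:\^\^<[^<>]*>|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?'

# One expression for the whole line: subject, predicate, object,
# an optional fourth term (the graph label of a quad), then the final dot.
LINE_RE = re.compile(
    rf'^\s*(?P<s>{IRI}|{BNODE})'
    rf'\s*(?P<p>{IRI})'
    rf'\s*(?P<o>{IRI}|{BNODE}|{LITERAL})'
    rf'(?:\s*(?P<g>{IRI}|{BNODE}))?'
    rf'\s*\.\s*(?:#.*)?$'
)


def split_line(line):
    """Return (subject, predicate, object, graph) or None if the line does not match.

    graph is None for a plain triple.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group("s"), m.group("p"), m.group("o"), m.group("g")


triple = split_line('<http://example.org/s> <http://example.org/p> "hello"@en .')
quad = split_line('_:b0 <http://example.org/p> <http://example.org/o> <http://example.org/g> .')
```

Because the fourth group is optional, the same expression can serve both formats, which is one way the N-Quads parser could share code with the N-Triples one.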

Windows: 14.71s => 12.20s: 17% faster.
Linux: 13.12s => 12.45s: 5% faster.

Please note that about 15% of the time is spent in codecs.readline(), which implies UTF-8 conversions because the file is opened in "rb" mode. Handling only str instead of bytes would easily speed things up.
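
As a rough illustration of the two reading strategies, the sketch below contrasts decoding a binary stream line by line through a `codecs` stream reader with letting a text-mode wrapper do the decoding. How the parser actually wires up its input is an assumption here, and the function names are hypothetical; no timing is claimed.

```python
import codecs
import io


def lines_via_codecs(raw):
    """Decode a binary stream line by line with a codecs StreamReader."""
    reader = codecs.getreader("utf-8")(raw)
    while True:
        line = reader.readline()
        if not line:
            break
        yield line


def lines_via_text_mode(raw):
    """Let a text wrapper decode; this mirrors open(path, "r", encoding="utf-8")."""
    yield from io.TextIOWrapper(raw, encoding="utf-8", newline="\n")


# Small in-memory sample standing in for a file opened in "rb" mode.
data = b'<a> <b> "caf\xc3\xa9" .\n<c> <d> <e> .\n'
via_codecs = list(lines_via_codecs(io.BytesIO(data)))
via_text = list(lines_via_text_mode(io.BytesIO(data)))
```

Both functions yield the same `str` lines for this sample, so the two paths could be timed against each other on a real file, for example with `timeit`.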
